8 research outputs found

    Results of the Translation Inference Across Dictionaries 2021 Shared Task

    The objective of the Translation Inference Across Dictionaries (TIAD) shared task is to explore and compare methods and techniques that infer translations indirectly between language pairs, based on other bilingual/multilingual lexicographic resources. In this fourth edition, the participating systems were asked to generate new translations automatically among three languages (English, French, Portuguese) based on known indirect translations contained in the Apertium RDF graph. These evaluation pairs have been the same during the last three TIAD editions. The main novelties this time were the use of a larger graph, Apertium RDF v2, as the basis for producing the translations, and the introduction of improved evaluation metrics. The evaluation of the results was carried out by the organisers against manually compiled language pairs of K Dictionaries. For the first time in the TIAD series, some systems beat the proposed baselines. This paper gives an overall description of the shared task, the evaluation data and methodology, and the systems' results.
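
    To illustrate what inferring translations "indirectly" can mean, here is a minimal sketch of pivot-based inference: a source word and a target word become a candidate pair when they share a translation in a third (pivot) language. This is not one of the participating systems or the organisers' baselines; the function name `infer_via_pivot` and the toy English-Spanish-French entries are invented for illustration and are not taken from the Apertium RDF graph.

```python
from collections import defaultdict


def infer_via_pivot(src_to_pivot, pivot_to_tgt):
    """Return {(source, target): count}, where count is the number of
    pivot words linking the source word to the target word."""
    scores = defaultdict(int)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            # Follow every known translation of the pivot word.
            for tgt in pivot_to_tgt.get(pivot, ()):
                scores[(src, tgt)] += 1
    return dict(scores)


# Hypothetical toy dictionaries (English -> Spanish, Spanish -> French).
en_to_es = {"bank": {"banco", "orilla"}, "bench": {"banco"}}
es_to_fr = {"banco": {"banque", "banc"}, "orilla": {"rive"}}

candidates = infer_via_pivot(en_to_es, es_to_fr)
for (src, tgt), count in sorted(candidates.items()):
    print(src, "->", tgt, "(shared pivots:", count, ")")
```

    Note that the polysemous pivot "banco" also yields the candidate pair bench/banque, which is the kind of false positive that makes indirect inference non-trivial.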

    Modelling frequency, attestation, and corpus-based information with OntoLex-FrAC

    OntoLex-Lemon has become a de facto standard for lexical resources in the web of data. This paper provides the first overall description of the emerging OntoLex module for Frequency, Attestations, and Corpus-Based Information (OntoLex-FrAC), which is intended to complement OntoLex-Lemon with the vocabulary necessary to represent major types of information found in or automatically derived from corpora, for applications in both language technology and the language sciences.

    Modelling collocations in OntoLex-FrAC

    Following presentations of frequency and attestations, and of embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets. Our goal is to elicit feedback from reviewers, the workshop audience and the scientific community in preparation for the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and to corpus-based collocation scores available from the web. Finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and how to export it to a tabular format so that it can be easily processed in downstream applications.

    Automatic processing of Albanian morphology

    Natural language processing has gained considerably in importance since its beginnings. Today it is essential and indispensable in many areas, e.g. when searching the Internet. One important building block for many applications in this area is a tool for the automatic recognition and production of word forms. Such a tool can be used in many settings, either as a standalone application, e.g. for didactic purposes or for the morphological annotation of corpora, or as a component in larger systems, e.g. for the syntactic analysis of texts. The system presented here is an automatic tool for the following tasks: spelling analysis, lemmatization, part-of-speech tagging, and full morphological analysis of word forms. The system can also be used in reverse mode, i.e. to generate word forms from a given lemma and its morphological attributes. The system covers the inflection of Albanian nouns, verbs, adjectives, numerals, adverbs and pronouns, as well as the non-inflecting parts of speech and the most frequent types of word formation. It has been tested against several test lists compiled from a variety of sources. With these properties, the morphology tool is suited to a wide range of use cases in Albanian natural language processing.
    Besim Kabashi was born in Istog, Kosovo, in 1972. He studied Natural Language Processing, German Linguistics and Computer Science at the Friedrich-Alexander-University of Erlangen-Nuremberg, Germany, and received his Magister Artium (M.A.) degree in 2003. Since then he has been working as a researcher at the Professorship for Computational Linguistics (or Corpus Linguistics), where he has been teaching and pursuing his research. In 2014 he received his doctorate in engineering (Dr.-Ing.) in Computer Science from the Friedrich-Alexander-University of Erlangen-Nuremberg. His main area of research has been the automatic recognition and production of word forms. He has authored and co-authored papers on a variety of topics in Natural Language Processing, including automatic word form recognition and production, computational lexicography, knowledge resources, corpus linguistics and statistics.

    SentiKLUE: Updating a Polarity Classifier in 48 Hours

    SentiKLUE is an update of the KLUE polarity classifier – which achieved good and robust results in SemEval-2013 with a simple feature set – implemented in 48 hours.